Should We Have Blind Faith in Bioinformatics Software? Illustrations from the SNAP Web-Based Tool

Authors

  • Sébastien Robiou-du-Pont
  • Aihua Li
  • Shanice Christie
  • Zahra N. Sohani
  • David Meyre
Abstract

Bioinformatics tools have gained popularity in biology, but little is known about their validity. We aimed to assess the early contribution of 415 single nucleotide polymorphisms (SNPs) associated with eight cardio-metabolic traits at the genome-wide significance level in adults in the Family Atherosclerosis Monitoring In earLY Life (FAMILY) birth cohort. We used the popular web-based tool SNAP to assess the availability of the 415 SNPs on the Illumina Cardio-Metabochip genotyped in the FAMILY study participants. We then compared the SNAP output with the Cardio-Metabochip file provided by Illumina, using the chromosomes and chromosomal positions of the SNPs from the NCBI Human Genome Browser (Genome Reference Consortium Human Build 37). With the HapMap 3 release 2 reference, SNAP reported 201 of the 415 SNPs as missing from the Cardio-Metabochip. However, the Cardio-Metabochip file revealed that 152 of these 201 SNPs were in fact present on the array (a false-negative rate of 36.6%, i.e. 152 of the 415 queried SNPs). With the more recent 1000 Genomes Project release, comparing the outputs of SNAP and the Illumina product file gave a false-negative rate of 17.6%. We did not find any 'false-positive' SNPs (SNPs specified as available on the Cardio-Metabochip by SNAP but not by the Illumina Cardio-Metabochip file). Cohen's kappa coefficient, which measures the agreement between both methods beyond that expected by chance, indicated that the validity of SNAP was fair to moderate depending on the reference used (HapMap 3 or 1000 Genomes). In conclusion, we demonstrate that the SNAP outputs for the Cardio-Metabochip are invalid. This study illustrates the importance of systematically and independently assessing the validity of bioinformatics tools. We propose a series of guidelines to improve practices in the fast-moving field of bioinformatics software implementation.
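The comparison in the abstract amounts to classifying each queried SNP into a 2×2 table (reported available by SNAP or not, present in the Illumina file or not) and then summarising that table. A minimal Python sketch, assuming each SNP is keyed by an identifier with a (chromosome, GRCh37 position) pair and that the Illumina file has already been reduced to a set of such pairs; the names `Agreement`, `compare`, `cohens_kappa` and `false_negative_share` are hypothetical and are not taken from the paper or from SNAP. The HapMap 3 counts used below are derived from the abstract: 415 − 201 = 214 SNPs reported present by SNAP (all truly present, since no false positives were found), 152 false negatives, and 201 − 152 = 49 SNPs absent from both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    """2x2 table of SNAP output vs. the array manifest (hypothetical structure)."""
    both_present: int   # SNAP and manifest both list the SNP
    snap_only: int      # SNAP lists it, manifest does not (false positive)
    manifest_only: int  # manifest lists it, SNAP reports it missing (false negative)
    both_absent: int    # neither lists it

    @property
    def total(self) -> int:
        return self.both_present + self.snap_only + self.manifest_only + self.both_absent


def compare(queried, snap_available, manifest_positions) -> Agreement:
    """Classify each queried SNP.

    queried: dict of SNP id -> (chromosome, GRCh37 position)
    snap_available: set of SNP ids that SNAP reports as on the array
    manifest_positions: set of (chromosome, position) pairs from the manifest file
    """
    both_present = snap_only = manifest_only = both_absent = 0
    for snp_id, coord in queried.items():
        in_snap = snp_id in snap_available
        in_manifest = coord in manifest_positions  # match on position, not on SNP id
        if in_snap and in_manifest:
            both_present += 1
        elif in_snap:
            snap_only += 1
        elif in_manifest:
            manifest_only += 1
        else:
            both_absent += 1
    return Agreement(both_present, snap_only, manifest_only, both_absent)


def cohens_kappa(a: Agreement) -> float:
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = a.total
    p_observed = (a.both_present + a.both_absent) / n
    snap_yes = (a.both_present + a.snap_only) / n
    manifest_yes = (a.both_present + a.manifest_only) / n
    # Chance agreement from the two marginal "present" rates
    p_expected = snap_yes * manifest_yes + (1 - snap_yes) * (1 - manifest_yes)
    return (p_observed - p_expected) / (1 - p_expected)


def false_negative_share(a: Agreement) -> float:
    """False negatives as a share of all queried SNPs (the abstract's denominator)."""
    return a.manifest_only / a.total


if __name__ == "__main__":
    # Counts derived from the HapMap 3 figures stated in the abstract
    hapmap3 = Agreement(both_present=214, snap_only=0, manifest_only=152, both_absent=49)
    print(f"false-negative share: {false_negative_share(hapmap3):.1%}")
    print(f"Cohen's kappa: {cohens_kappa(hapmap3):.2f}")
```

With these counts the sketch reproduces the 36.6% figure and gives a kappa of about 0.25, which lies in the 0.21 to 0.40 "fair" band of the commonly used Landis and Koch benchmarks, consistent with the "fair to moderate" range reported. Note that the abstract's rate divides by all 415 queried SNPs; the conventional false-negative rate, FN / (FN + TP), would instead be 152 / 366.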

Similar articles

SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap

Summary: The interpretation of genome-wide association results is confounded by linkage disequilibrium between nearby alleles. We have developed a flexible bioinformatics query tool for single-nucleotide polymorphisms (SNPs) to identify and to annotate nearby SNPs in linkage disequilibrium (proxies) based on HapMap. By offering functionality to generate graphical plots for these data, the SNAP s...

SNAP: Combine and Map modules for multilocus population genetic analysis

We have added two software tools to our Suite of Nucleotide Analysis Programs (SNAP) for working with DNA sequences sampled from populations. SNAP Map collapses DNA sequence data into unique haplotypes, extracts variable sites and manipulates output into multiple formats for input into existing software packages for evolutionary analyses. Map collapses DNA sequence data into unique haplotypes, ...

Mobyle SNAP Workbench: a web-based analysis portal for population genetics and evolutionary genomics

Summary: Previously we developed the stand-alone SNAP Workbench toolkit that integrated a wide array of bioinformatics tools for phylogenetic and population genetic analyses. We have now developed a web-based portal front-end, using the Mobyle portal framework, which executes all of the programs available in the stand-alone SNAP Workbench toolkit on a high-performance Linux cluster. Additionally...

Expert Discovery: A web mining approach

Expert discovery is a quest in search of finding an answer to a question: “Who is the best expert of a specific subject in a particular domain within peculiar array of parameters?” Expert with domain knowledge in any field is crucial for consulting in industry, academia and scientific community. Aim of this study is to address the issues for expert-finding task in real-world community. Collabor...

Prediction of Toxin-Antitoxin system (TA system) as a Novel Potent Target in Salmonella typhi Using Bioinformatics Analysis

Background and Objective: Salmonella typhi is one of the major challenges for the human and animal health. Salmonella with high pathogenicity can be harmful factor for human health. The control of this pathogen is a big challenge as it can cause serious infectious diseases such as gastroenteritis, septicemia and typhoid fever. On the other side, there are many factors such as toxin-antitoxin (T...

Journal title:

Volume 10, Issue

Pages: -

Publication date: 2015